[AutoParallel] Visualize flow parallel timing diagram in static graph mode #58313
Conversation
Your PR was submitted successfully. Thank you for contributing to the open-source project!
@@ -38,6 +38,10 @@
 #include "paddle/fluid/platform/device_event.h"
 #include "paddle/phi/backends/device_manager.h"

+#if defined(PADDLE_WITH_CUDA)
+#include "paddle/phi/kernels/autotune/gpu_timer.h"
gpu_timer is related only to the concrete implementation of the class interface, not to the class definition. It should be included only in the .cc files that actually use it, not in the base-class header.
@@ -103,6 +104,16 @@ ProgramInterpreter::~ProgramInterpreter() {
 }

+void ProgramInterpreter::RunImpl() {
+#if defined(PADDLE_WITH_CUDA)
+  if (FLAGS_auto_parallel_profiler) {
+    // Note(sonder): Record the start time of the each stream.
A NOTE is usually reserved for explaining complex, hard-to-read code, or for conveying information the code itself cannot express. These lines are simple and direct, and this NOTE merely restates the code, so it can be dropped.
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
  stream_timers_.clear();
  std::vector<gpuStream_t> streams;
  bool has_default_stream = false;
The Paddle framework never uses the null (default) stream, so there is no need to handle that case.
void Start() {
  struct timeval time_now {};
  gettimeofday(&time_now, nullptr);
  start_time_ = (time_now.tv_sec * 1000) + (time_now.tv_usec / 1000.0);
A comment could be added here explaining why CPU time is needed as start_time.
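As a side note on the `Start()` snippet above, the millisecond conversion it performs can be sketched as follows. This is only an illustration of the arithmetic in the diff, with a hedged guess (marked as an assumption in the comments) about why a host-side clock is used at all:

```python
# Sketch of the millisecond conversion in Start() above, assuming
# gettimeofday() semantics: tv_sec in seconds, tv_usec in microseconds.
def to_milliseconds(tv_sec: int, tv_usec: int) -> float:
    # Mirrors the C++ expression:
    # (time_now.tv_sec * 1000) + (time_now.tv_usec / 1000.0)
    return (tv_sec * 1000) + (tv_usec / 1000.0)

# Assumption (not stated in the PR): a host-side (CPU) wall-clock start
# time gives every stream a common reference point, so per-stream GPU
# event offsets can be aligned onto one shared timeline.
print(to_milliseconds(2, 500_000))  # 2 s + 0.5 s -> 2500.0 ms
```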
double start_time, end_time;
std::tie(start_time, end_time) =
    interpretercores_[job_idx]->InterpreterRunTime();
VLOG(0) << "Profiler Info: Job (" << job_idx << "), type = " << job_type
Add a comment here explaining what this log is for; otherwise someone unfamiliar with it might change it by mistake.
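The reviewer's concern makes sense because, per the description, profiler_helper_static.py regex-matches GLOG output for exactly this VLOG(0) line. A minimal sketch of such parsing is below; only the prefix "Profiler Info: Job (<idx>), type = <type>" appears in the diff, so any further fields are deliberately omitted rather than guessed, and `parse_profiler_line` is a hypothetical helper name:

```python
import re

# Match the VLOG(0) message emitted above; the pattern covers only the
# portion of the format visible in the diff.
PATTERN = re.compile(r"Profiler Info: Job \((\d+)\), type = (\w+)")

def parse_profiler_line(line: str):
    """Return (job_idx, job_type) if the line is a profiler record, else None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)

# GLOG lines carry a prefix (timestamp, file:line) before the message.
print(parse_profiler_line("I1030 12:00:00.0 1 interp.cc:1 Profiler Info: Job (3), type = backward"))
```

This also illustrates why the description recommends `GLOG_v=0`: the less unrelated log output there is, the less text this kind of regex scan has to walk through.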
@@ -0,0 +1,117 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
This script could be placed under distributed/auto_parallel/static/.
-      const std::vector<std::string>& feed_names, bool need_fetch = true) = 0;
+      const std::vector<std::string>& feed_names,
+      bool need_fetch = true,
+      bool enable_auto_parallel_profiler = false) = 0;
Suggested change:
- bool enable_auto_parallel_profiler = false) = 0;
+ bool enable_job_schedule_profiler = false) = 0;
done
@@ -34,6 +34,10 @@ PADDLE_DEFINE_EXPORTED_bool(new_executor_use_local_scope,
                             true,
                             "Use local_scope in new executor(especially used "
                             "in UT), can turn off for better performance");
+PADDLE_DEFINE_EXPORTED_bool(auto_parallel_profiler,
Why is this FLAG still needed?
Removed.
enable_auto_parallel_profiler_ = enable_auto_parallel_profiler;

if (enable_auto_parallel_profiler_) {
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
The compile-guard macro should wrap the conditional from the outside; otherwise, when the macro condition is not met, you end up with odd code like:

if (enable_auto_parallel_profiler_) {
  // empty
}
> The compile-guard macro should wrap the conditional from the outside; otherwise, when the macro condition is not met, you end up with odd code like `if (enable_auto_parallel_profiler_) { /* empty */ }`.

Fixed.
if (enable_auto_parallel_profiler_) {
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
  gpuStream_t calculated_stream =
Is it necessary to fetch and set the same compute stream on every run? Could CalculateStreamTimer obtain the compute stream internally at construction time, so that external callers do not need to set it?
> Is it necessary to fetch and set the same compute stream on every run? Could CalculateStreamTimer obtain the compute stream internally at construction time, so that external callers do not need to set it?

Fixed: place_ is now passed in at construction time, and the compute stream is created internally.
@@ -211,6 +219,12 @@ class ProgramInterpreter : public InterpreterBaseImpl {
   InstructionSchedulingPriorityLess instruction_scheduling_priority_less;

   std::vector<HookFunc> hookfuncs_;

+#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
+  phi::CalculatedStreamTimer calculated_stream_timer_;
Suggested change:
- phi::CalculatedStreamTimer calculated_stream_timer_;
+ phi::CalculatedStreamTimer calculate_stream_timer_;
done
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
  phi::CalculatedStreamTimer calculated_stream_timer_;
#endif
  size_t last_calculated_instr_id;
Suggested change:
- size_t last_calculated_instr_id;
+ size_t last_calculate_instr_id_;
done
@@ -1040,6 +1063,15 @@ void ProgramInterpreter::RunInstruction(const Instruction& instr_node) {

   try {
     instr_node.WaitEvent(place_);
+    if (enable_auto_parallel_profiler_) {
+#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
+      if (!interpreter::IsCommunicationOp(instr_node) &&
`!calculated_stream_timer_.IsStarted()` is just a simple flag check, and for most operators it evaluates to false, whereas `!interpreter::IsCommunicationOp(instr_node)` involves a fair amount of branching logic. In this situation, `!calculated_stream_timer_.IsStarted()` should be the first operand of the `&&` expression, so that C++ short-circuit evaluation reduces the number of actual calls to `!interpreter::IsCommunicationOp(instr_node)` and improves performance.
done
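The reordering the reviewer asks for relies on short-circuit evaluation: if the first operand of `&&`/`and` is false, the second is never evaluated. A small illustration (with hypothetical stand-in names, not the actual PR code):

```python
# Demonstrate the short-circuit ordering: put the cheap, usually-false
# flag check first so the expensive check rarely executes.
calls = {"expensive": 0}

def is_communication_op(instr) -> bool:
    """Stand-in for the costly check with lots of branching logic."""
    calls["expensive"] += 1
    return instr.get("comm", False)

def timer_is_started() -> bool:
    """Stand-in for the cheap flag check; true for most operators."""
    return True

instrs = [{"comm": False} for _ in range(100)]
for instr in instrs:
    # Cheap check first: it is False here, so `and` short-circuits and
    # is_communication_op() is never called.
    if (not timer_is_started()) and (not is_communication_op(instr)):
        pass

print(calls["expensive"])  # 0: the expensive check was skipped every time
```

With the operands in the opposite order, `is_communication_op` would run once per instruction, which is exactly the overhead the review comment is trying to avoid.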
@@ -114,6 +114,7 @@ def set_field_default_config(category, field, default_value):
 set_field_default_config(PIPELINE, "accumulate_steps", 1)
 set_field_default_config(PIPELINE, "generation_batch_size", 1)
 set_field_default_config(PIPELINE, "enable_send_recv_overlap", False)
+set_field_default_config(PIPELINE, "schedule_profiler", False)
It looks like this switch can only enable or disable the profiler, with no way to specify a sampling interval. Could we support setting pipeline.schedule_profiler_start and pipeline.schedule_profiler_end directly? The default [-1, -1) would mean disabled; otherwise the profiler runs within [start, end), and the whole job exits after step end - 1.
> It looks like this switch can only enable or disable the profiler, with no way to specify a sampling interval. Could we support setting pipeline.schedule_profiler_start and pipeline.schedule_profiler_end directly? The default [-1, -1) would mean disabled; otherwise the profiler runs within [start, end), and the whole job exits after step end - 1.

There is already code that controls the sampling interval by combining Profiler_auto.nvprof_start and Profiler_auto.nvprof_end, in the PaddleNLP PR:
LGTM
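The [start, end) sampling window proposed in the review can be sketched in a few lines. Note the config names follow the review comment (schedule_profiler_start/schedule_profiler_end), not necessarily the API the PR finally adopted:

```python
# Minimal sketch of the proposed sampling window: the default [-1, -1)
# means "profiler disabled"; otherwise profile steps in [start, end),
# with the run expected to stop after step end - 1.
def profiler_enabled(step: int, start: int = -1, end: int = -1) -> bool:
    if start < 0 or end < 0:
        return False  # default [-1, -1): never enabled
    return start <= step < end

# Steps 2 and 3 fall inside the half-open window [2, 4).
print([s for s in range(6) if profiler_enabled(s, start=2, end=4)])  # [2, 3]
```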
… graph mode (PaddlePaddle#58313)

* merge from openvino master
* add InterpreterRunTime() to record interpreter's run time
* add profiler helper static to produce json file
* add color map and support perfetto format
* recover codes
* control include env for gpu_timer.h
* fix logic for profiler_helper_static.py
* fix build error
* fix build error
* recover thirdparty
* add flag control: not support new ir now
* set auto_parallel_profiler flag to false
* fix
* add auto_parallel_profiler as command parameter
* fix value name
* support gettimeofday for win env
* fix win build error
* fix win build error
* use job_type_to_id
* Fixed repeatedly timing the same stream
* add step line for timeline
* add step timeline and fix logic when job overlap
* update time record logic
* fix bug when start profile start from none zero step
* fix note
* remove FLAGS_auto_parallel_profiler
* use run config instead FLAGS_auto_parallelxx
* fix color map logic
* fix color map logic
* fix bug when log step does not start from 0
* fix
* fix
* don't use set_enable_auto_parallel_profiler
* fix bug
* disable auto_parallel_profiler when not open flag by command line
* fix bug
* remove resettime
* fix build bug
* fix
* remove set enable
* fix build error
* fix build error
* fix build error
* fix ci error
* fix
* fix run error
* fix
* fix
* fix calculate_stream_timer logic
* remove fluid head
* fix build error
* set default value for enable_job_schedule_profiler
PR types
Others
PR changes
Others
Description
Visualize the pipeline-parallel timing diagram in static graph mode.
In static graph mode, auto-parallel execution calls StandaloneExecutor::Run on the C++ side, which runs the pre-split Jobs in order. The main goal of this PR is to visualize the execution timeline of the Jobs on different devices and inspect it with Chrome::tracing.

How to use?
The following walks through using the test_pipeline_scheduler unit test to generate log files and produce the visualized timeline. Since the unit test clears the generated log files by default, we first need to remove the log-clearing logic and specify a log directory:

1. With the FLAG enabled, run the training process and generate logs. GLOG_v=0 keeps the log output as small as possible, reducing regex-matching time.
2. Run profiler_helper_static.py to generate the json file.
3. Open the json file with Chrome Tracing.

pipeline_profile_perfetto.json can also be opened with perfetto.

Related PRs: